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INTRODUCTION 


A users’ workshop for CARE III, a reliability assessment tool designed and 
developed especially for the evaluation of high-reliability fault-tolerant digital systems, 
was held at NASA Langley Research Center in Hampton, Virginia, on October 6 and 7, 
1987. 

The main purpose of the workshop, sponsored by co-chairmen Salvatore J. 
Bavuso (NASA-LaRC) and Anna L. Martensen (PRC Kentron, Inc.), was to assess the 
evolutionary status of CARE III or, as Sal Bavuso put it, "where do we go from 
here?" 

The workshop opened with Chuck Meissner, branch head of the Systems Validation 
Methods Branch, welcoming the 19 attendees from 13 different companies and giving 
them an overview of NASA-LaRC. Sal Bavuso followed with an introduction to and 
history of CARE III, and Roberto E. Altschul of Boeing Electronics Company discussed 
its mathematical theory. The rest of the first day and half of the second day were 
devoted to discussions and presentations by attendees and members of NASA-LaRC. 
The features and limitations of CARE III and comparisons with other tools were the main 
topics. Copies of the presentations follow this introduction. 

A tour of AIRLAB began on the second day with an overview of the facility by 
Chuck Meissner. The attendees alternated among three different stations in which 
there were discussions and demonstrations of the Semi-Markov Unreliability Range 
Evaluator (SURE) by Ricky Butler, Fault Injection by George Finelli, and Software 
Reliability by Jon Sjogren. The final hours of the workshop were devoted to hands-on 
demonstrations and tutorials of CARE III and HARP. 

The results of a questionnaire filled out by the attendees helped to address recom- 
mendations for future enhancement of CARE III. A summary of the responses follows: 

1. Weibull is frequently used for long mission times. 

2. Some users compute steady-state availability with CARE III. 

3. Many users do both steady state and instantaneous availability. 

4. Most users believe CARE III can model very large systems and most don’t know of 
another program that can handle fault tree applications. 

5. All users consider CARE III to be valuable enough to warrant continued develop- 
ment. 

6. The number of components in fault-tolerant systems can vary from about 15 to 
3,000, depending upon the complexity of the system. 

7. Most execution times for CARE III range from 5 to 10 minutes. 

8. Most users responded "confident" to "very confident" in using CARE III (none 
responded "skeptical" or "not-confident"). 

9. Testing of CARE III is thought to be adequate to very good. 
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10. One user discovered a method of approximating the effects of state dependent 
coverage. 

11. One user suggested using the Weibull distribution to approximate dormant 
spares. 

12. All the stated applications are aerospace. 

13. Sequence and state dependency modeling are common. 

14. Some users typically use the CARE III FEHM and think it is not powerful enough, 
while others say it is too complicated and don’t use it. 

15. No user suggested a method of doing sequence dependency with CARE III. 

16. Some use the CARE III menu and some don’t. 

In conclusion, the CARE III program is considered to be very flexible and useful. 
A recurrent theme throughout the workshop, however, was the suggestion that one 
should use more than one tool at a time when analyzing complex systems. An overall 
interest in CARE III was indeed displayed by all of the attendees. 
THE CARE III THEORY IMPLEMENTATION 
AND CODE 


October 6, 1987 


Anna L. Martensen 
PRC Kentron, Inc. 
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The Fault Occurrence/Repair Model 
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Fault/Error Handling Model 
$FHMNAMES 
FHMNAME(1)= '(NONE)', 
FHMNAME(2)= 'PERMANENT C', 
FHMNAME(3)= 'PERMANENT B' 
$END 

$FLTTYP 
NFTYPS= 3, 
ALP= 0.0, 0.0, 0.0, 
BET= 0.0, 0.0, 0.0, 
DEL= 3600.0, 360.0, 10000.0, 
RHO= 0.0, 180.0, 0.0, 
EPS= 0.0, 3600.0, 0.0, 
IDELF= 1, 1, 1, 
IRHOF= 1, 1, 1, 
IEPSF= 1, 1, 1, 
MARKOV= 1, 
PA= 1.0, 1.0, 1.0, 
PB= 1.0, 0.0, 0.0, 
C= 1.0, 9.990000E-01, 1.0, 
LGTMST=T 
$END 

$STGNAMES 
STGNAME(1)= 'INERTIAL REF', 
STGNAME(2)= 'PITCH RATE', 
STGNAME(3)= 'COMPUTER', 
STGNAME(4)= 'SECONDARY ACT', 
STGNAME(5)= 'COMPUTER BUS' 
$END 

$STAGES 
NSTGES=5, 
N= 3, 3, 4, 3, 4, 
M= 2, 2, 2, 2, 2, 
NSUB= 0, 0, 0, 0, 0, 
MSUB= 0, 0, 0, 0, 0, 
LC= 0, 0, 0, 0, 0, 
NOP(1,3)= 3, 
NOP(1,5)= 3, 
IRLPCD=1, 
RLPLOT=F, IAXSRL=2 
$END 

$FLTCAT 
NFCATS=1, 1, 1, 1, 1, 
JTYP(1,1)= 1, 
JTYP(1,2)= 1, 
JTYP(1,3)= 2, 
JTYP(1,4)= 1, 
JTYP(1,5)= 3, 
OMG(1,1)= 1.0, 
OMG(1,2)= 1.0, 
OMG(1,3)= 1.0, 
OMG(1,4)= 1.0, 
OMG(1,5)= 1.0, 
RLM(1,1)= 1.500000E-05, 
RLM(1,2)= 1.900000E-05, 
RLM(1,3)= 4.800000E-04, 
RLM(1,4)= 3.700000E-05, 
RLM(1,5)= 2.700000E-06 
$END 

$RNTIME 
FT= 10.0000, ITBASE=1, 
PSTRNC= 0.100000E-09, 
QPTRNC= 0.100000E-01, 
NPSBRN=20, 
CKDATA=T, 
SYSFLG=T, CPLFLG=T 
$END 

SYSTEM TREE EX 7 
1 5 6 6 
6 O 1 2 3 4 5 

CRITICAL PAIRS TREE EX 7 
1 8 9 18 
3 1 4 
5 5 8 
9 O 2 3 
10 O 1 3 
11 O 1 2 
12 A 9 5 
13 A 10 6 
14 A 11 7 
15 2 1 2 3 
16 2 5 6 7 
17 O 12 13 14 
18 O 15 16 17 
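
As a rough cross-check on the stage data in the listing above, the following Python sketch estimates the probability that each stage exhausts its modules by the mission time. It is only an illustration under stated assumptions, not what CARE III computes: it reads N as the initial number of modules, M as the minimum number required, RLM as a constant failure rate per hour (OMG = 1.0), and FT as the mission time in hours, and it assumes perfect fault handling, so critical-pair and coverage failures are ignored. The function names are hypothetical.

```python
import math

# Stage data copied from the listing: (N, M, RLM). The reading of these
# fields as (initial modules, minimum required, failure rate per hour)
# is an assumption made for this sketch.
STAGES = {
    "INERTIAL REF": (3, 2, 1.5e-05),
    "PITCH RATE": (3, 2, 1.9e-05),
    "COMPUTER": (4, 2, 4.8e-04),
    "SECONDARY ACT": (3, 2, 3.7e-05),
    "COMPUTER BUS": (4, 2, 2.7e-06),
}
MISSION_HOURS = 10.0  # FT in the $RNTIME namelist


def exhaustion_probability(n, m, rate, hours):
    """Probability that fewer than m of n independent modules survive."""
    q = -math.expm1(-rate * hours)  # single-module failure probability
    # The stage is exhausted when more than n - m modules have failed.
    return sum(
        math.comb(n, k) * q**k * (1.0 - q) ** (n - k)
        for k in range(n - m + 1, n + 1)
    )


def system_unreliability(stages=STAGES, hours=MISSION_HOURS):
    """System fails if any stage is exhausted (OR of the stage events)."""
    survive = 1.0
    for n, m, rate in stages.values():
        survive *= 1.0 - exhaustion_probability(n, m, rate, hours)
    return 1.0 - survive


if __name__ == "__main__":
    for name, (n, m, rate) in STAGES.items():
        print(name, exhaustion_probability(n, m, rate, MISSION_HOURS))
    print("system", system_unreliability())
```

Because coverage failures are left out, a figure obtained this way is better treated as a sanity check on the input data than as a reliability estimate.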
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The CARE III Program 
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INTRODUCTION TO THE 
CARE III MATHEMATICAL MODEL 

October 6, 1987 

Roberto E. Altschul 
Boeing Electronics Company 
FAULT TOLERANCE (PERFECT FAULT-HANDLING): ABILITY TO MASK FAULT UNTIL DETECTION 

MODULE: BASIC UNIT 

INDEPENDENT FAILURES 

SYSTEM FAILURE: MODULE EXHAUSTION 

STAGE FAILURES 


CLASSICAL STOCHASTIC MODEL 

OPERATIONAL STATES 

MODULE: BASIC UNIT 

INDEPENDENT FAILURES 

SYSTEM FAILURES 
A: ACTIVE 
B: BENIGN 
D: DETECTED 
E: ERROR 
F: FAILURE 
P: DETECTED AS PERMANENT (NON-TRANSIENT) 

t = time from entry into active state A 
τ = time from entry into error state E 


Single Fault Coverage Model 
General Structure of CARE III Aggregate Model 

RTI’S USE OF CARE III 

October 6, 1987 

Charlotte O. Scheper 
Research Triangle Institute 
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OUTLINE 


• Comparative study of CARE III and ARIES 82 
— Objective 
— Technical approach 
— CARE III assessment 
— ARIES 82 assessment 


• Comparison of on chip versus off chip redundancy 


• Integration of performance and reliability tools 


RTI 
COMPARATIVE STUDY OF CARE III & ARIES 82 

OBJECTIVES 

• Determine suitability for AIPS analysis 
• Compare CARE III and ARIES 82 
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TECHNICAL APPROACH 


• Obtain and review AIPS architecture information 
— Objectives and requirements 
— Features and building blocks 
— Impact on reliability assessment 

• Overview of CARE III and ARIES 

• Apply CARE III and ARIES to problems 
— Problems selected to demonstrate relative 
strengths and weaknesses 
— Problems selected so that solution could be 
obtained by standard analysis techniques 
based on Markov model 
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CARE III ASSESSMENT 
OUTSTANDING FEATURES 


• Can handle large systems 

• Flexible fault handling model 

• Can have non-constant fault occurrence rates 

• Well tested and verified 



CARE III ASSESSMENT 
APPLICABLE SYSTEM CHARACTERISTICS 


• Best suited for systems where: 

— The mission time is short relative to the time 
between failure occurrences 

— The fault recovery time is short relative to the 
time between failure occurrences 

— Either the network reliability cannot impact system 
reliability or the network can be treated as an 
independent subsystem whose reliability can be 
determined by other means 

— Near coincident multiple faults of order greater 
than two are not relevant 

— System reliability should be in the extremely to 
ultrareliable regime 


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CARE III ASSESSMENT 
POTENTIAL LIMITATIONS 

• The fundamental assumption that sojourn times 
in the fault handling model are small relative 
to the time between fault occurrences may not 
be valid for latent faults or for some intermittent 
faults 

• The fault handling models used are independent 
of system state 

• The fault handling model is constrained 
— To a single entry state 
— To have identical transition rates (α, β) 
between active and benign for faulted 
and error-producing states 
— Transitions between some states of the 
model are omitted 

• The double fault model is conservative 

— A system failure results if two critically 
coupled faults occur even though neither 
has produced an error 


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ARIES ASSESSMENT 
OUTSTANDING FEATURES 


• The capability to model closed or open systems 

• Spare modules can have failure rates that are 
different than active module failure rates 

• A state transition matrix can be used to 
describe a system 

• An interactive user interface 


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ARIES ASSESSMENT 
POTENTIAL LIMITATIONS 


• Instantaneous coverage may not be adequate for modeling 
some systems 

• Constant failure rates are not adequate for modeling 
certain components of aerospace systems 

• System sizes are limited to relatively small systems 

• The accuracy of the results is suspect for highly 
reliable systems 

• The eigenvalues of the state transition matrix 
must be distinct 


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INTEGRATION OF PERFORMANCE 
AND RELIABILITY TOOLS 

PURPOSE 


To develop an integrated set of tools to assist 
the system architect in the design of high-performance, 
highly reliable systems. 


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TOOL DEVELOPMENT GOALS 


• Tools for building an interrelated description 
of the system and its mission 
— Mission scenarios 
— System software 
— System hardware 

• Tools for making tradeoffs between 
different system requirements 
— System throughput 
— System response time 
— System reliability 

• Tools for maintaining consistent models 
of the system 
— Reliability models 
— Hardware models 
— Software models 
— Fault models 




PHASE I 


• Build paradigm model 


• Analyze paradigm model 


• Define methodology 


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PHASE II 


• Use methodology to revise the paradigm 

• Specify tools to support the methodology 

• Find existing tools which support the methodology 


PHASE III 


• Design interfaces between existing tools 

• Build and test interfaces between tools 

• Beta test the integrated tool sets 

• Demonstrate integrated tool sets 
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POSSIBLE INTERFACE BETWEEN 
ADAS AND RELIABILITY ANALYSIS TOOLS 
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SUMMARY 


• Areas of applicability 


• Role in integrated tools 


USE OF CARE III FOR 
FLIGHT CONTROLS DEVELOPMENT 
AT NORTHROP 


October 6, 1987 


Jack Flynn 

Northrop Corporation 
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NORTHROP FLIGHT CONTROLS DEVELOPMENT 
CARE III EXPERIENCE 
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CARE III ENHANCEMENTS 
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EXAMPLES OF NONCONSERVATIVE RELIABILITY 
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This prompted a 4-way comparison among reliability analysis tools - 
SURE, CARE III, PAWS, and STEM 
uses Taylor series expansion techniques in calculating the matrix 
exponential needed to solve the system of equations used to determine 
the death-state probabilities of a pure Markov model 
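
The Taylor-series calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the code of any of the tools compared: the three-state triad model (three good, two good, failed), the rate values, and the function names are assumptions chosen for the example. A plain truncated series like this is only reasonable when the product of the rates and the mission time is small; larger values would call for scaling before the series is summed.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def expm_taylor(m, terms=30):
    """Approximate exp(m) by the truncated series sum of m**k / k!."""
    n = len(m)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]  # holds m**k / k!, starting at k = 0
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, m)]
        result = [
            [r + t for r, t in zip(res_row, term_row)]
            for res_row, term_row in zip(result, term)
        ]
    return result


def triad_death_probability(rate, hours):
    """Death-state probability of a triad with perfect fault handling.

    Hypothetical pure Markov model: state 0 (three good) -> state 1
    (two good) at 3*rate, state 1 -> state 2 (failed) at 2*rate.
    """
    generator = [
        [-3.0 * rate, 3.0 * rate, 0.0],
        [0.0, -2.0 * rate, 2.0 * rate],
        [0.0, 0.0, 0.0],
    ]
    scaled = [[x * hours for x in row] for row in generator]
    transition = expm_taylor(scaled)
    return transition[0][2]  # start in state 0, end in the death state


if __name__ == "__main__":
    print(triad_death_probability(1.0e-4, 10.0))
```

For this small model the result can be compared with the closed form 1 - 3exp(-2λt) + 2exp(-3λt), which is a convenient check on the truncation.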
CARE III's Fault-handling Model 

CARE III's Fault-handling Model Simplified 

Comparison of SURE, PAWS, STEM, and CARE III for Example 
Triad with Transient Faults 

Comparison of SURE, PAWS, STEM, and CARE III for Example 2 



NOTE: The fault-handling model was designed for fast transients and intermittents, where α and β would 
be within 2 or 3 orders of magnitude of the other fault-handling parameters. Slower transients and 
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Comparison of SURE, PAWS, STEM, and CARE III for Example 
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Whenever possible, compare estimates 



HARP 

October 7, 1987 

Salvatore J. Bavuso 
NASA Langley Research Center 
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Fourth Generation Computer-Aided 
Reliability Engineering Tool 


Codeveloped by Duke University, LaRC, and 
Clemson University 







HYBRID AUTOMATED RELIABILITY PREDICTOR (HARP) 
A FLEXIBLE RELIABILITY ESTIMATION TOOL 






COMBINED ANALYTIC SIMULATIVE APPROACH 
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Analytic Solution 

- To predict system reliability 


HARP 

(HYBRID AUTOMATED RELIABILITY PREDICTOR) 
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RELIABILITY PREDICTION 






EXAMPLE SYSTEM 
FAULT OCCURRENCE AND REPAIR MODEL (FORM) 
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FAULT/ERROR HANDLING MODEL (FEHM) 
[Figure: FEHM entered when a fault occurs, with exits labeled Near-Coincident Fault and Single-Point Failure. Exit probabilities and distributions: P_R(∞), P_R(t); P_C(∞), P_C(t); P_S(∞), P_S(t).] 
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COMBINED FORM/FEHM MODEL 

Markov Chain Representation 
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COMBINED FORM/FEHM MODEL 

Markov Chain Representation 
3-Processor / 2-Bus System 
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UNRELIABILITY OF 3-PROCESSOR / 2-BUS SYSTEM 

(With Sensitivity Bounds) 

+5% Failure Rate 
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COMPARISON OF TOOLS 
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NO CROSS-COMPARISON OF THE THREE PROGRAMS IS ATTEMPTED. 
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Parametric analysis capability 
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Computationally fast tools (less than 24 cpu hours for very large models) 
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Figure 1. Intermediate state model 
The CARE III Weibull capability is limited by the constraints 
listed above: no repair or state-dependent transitions. 


(2) ( 1 ) The program provides a reasonably flexible Markov capability, 

especially for systems that can be decomposed to FORM and FEHM 
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HARP input files are easily edited by hand 



Computation of 
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models can be modeled 
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(Model continues) 
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DELTA (Recovery) 



Figure 3. Cold spare model 
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Figure 4. Intermittent fault model 


applied to three advanced fault-tolerant systems as a check on the model's 
flexibility and accuracy. Requirements for a user-oriented computer 
program which embodies the CARE III reliability modeling technique will 
also be generated. In the second phase, the CARE III model user-oriented 
computer program shall be written and tested. 

2.0 STUDY OBJECTIVE 

The objective of this study is to develop a general reliability 
assessment technique which is capable of estimating the reliability of 
a broad class of fault-tolerant computer and digital flight control 
systems. In its final form, the advanced technique will be implemented 
as a user-oriented computer program. 

3.0 CONTRACTOR TASKS 

The contractor shall perform the following tasks in two phases 
of work: Phase I shall include task 3.1 (establish the requirements 
for a general reliability assessment tool), task 3.2 (develop the 
CARE III reliability modeling technique), and task 3.3 (demonstrate and 
validate the CARE III technique). Phase II shall include task 3.4 
(generate a user-oriented CARE III computer program). 

3.1 Requirements for General CARE III Capability 

3.1.1 General CARE Assessment Technique 

The contractor shall determine the requirements for a general 
CARE assessment technique. The selected requirements should be broad 
enough to encompass both present and anticipated digital computer, 
flight control systems, and assessment techniques. Detailed examination 
of these systems and techniques should provide a sound basis 
for determining the requirements. A partial list of requirements 
is identified in 3.1.1.2. The union of these listed requirements and 
those generated by the contractor should establish a sufficient set of 
requirements. 

3.1.1.1 The contractor shall study at least the 
following techniques for determining the requirements: CARE II 
(Computer-Aided Reliability Estimation), CAST (Combined Analytic 
Simulative Technique), CARSRA (Computer-Aided Redundant System 
Reliability Analysis), the SIFT (Software Implemented Fault-Tolerant) 
Computer Semi-Markov Technique, the Markov Technique utilized to assess 
the CSDL (Charles Stark Draper Laboratory) Parallel Hybrid Multiprocessor, 
and the modeling techniques utilized for the U.S. Air Force’s Fault-Tolerant 
Spaceborne Computer (FTSC). The contractor shall also examine the 
following computer systems in determining CARE III requirements: 
ARCS (Advanced Reconfigurable Computer System), SIFT, the CSDL Parallel Hybrid 
Multiprocessor, the FTSC, and the WWCS (Whole Word Computer System). 

3.1.1.2 The contractor shall consider at least the 
following items for inclusion in the requirements: 

3.1.1.2.1 Reliability Structural Model 

o coverage parameters 

o Poisson and non-Poisson transient fault parameters and/or 
models with renewal (self-repair) 

o Poisson and non-Poisson software fault parameters and/or 
models 

o Poisson and non-Poisson hardware fault parameters and/or 
models 

o general fault redundancy management capability and systems 
success criteria 

o utilization of standard reliability models such as TMR and 
stand-by 

o functional dependency between stages 

o sensitivity analysis capability 

o amenable to validation 

3.1.1.2.2 Coverage Model 

o cover all important fault classes (e.g., hardware, software, 
transient, and latent) and coverage enhancement mechanisms (fault 
detectors, isolators, and reconfiguration schemes) 

o compatible with the general reliability structural model 
(3.1.1.2.1) 

o uses available or measurable input data 

o time dependent coverage 

o amenable to validation 

3.1.2 General CARE Computer Program Requirements 

The contractor shall determine the requirements for the computer 
program that shall implement the General CARE technique and shall 
include at least the following: 

o user-oriented (batch or interactive) 

o easy to operate, set up, and manipulate 

o reasonable operational costs and accuracy 

3.2 Development of a General Reliability Technique (CARE III Technique) 
3.2.1 CARE III Model 
3.2.1.1 The contractor shall examine various modeling 
techniques to meet the requirements established under task 3.1. The 
methods to be considered shall include, but shall not be limited to, 
direct extensions of the CARE II model with or without additional 
constraints, use of Laplace transforms to eliminate multiple integrals, 
and a generalized Markov-chain analysis based on the Chapman-Kolmogorov 
constraint. 

3.2.1.2 The CARE III Model shall accommodate the 
following: 

o number of stages shall be at least 40 

o number of modes shall be N, where N is user assignable. 
This capability shall allow multiple dependencies across stages. 

o a non-recoverable module transient should not cause 
system failure, but shall be treated as a "leaky" transient as a user 
option. 

3.2.2 CARE III Coverage Model 

3.2.2.1 The contractor shall enhance the CARE II 
coverage model to meet the requirements established under task 3.1. 

3.2.2.2 The contractor shall identify and develop 
techniques to simplify the task of acquiring data for the coverage model. 

3.3 Demonstration and Validation of the CARE III Technique 

The contractor shall assess at least the ARCS, SIFT, WWCS, and Parallel 
Hybrid Multiprocessor Computer/Flight Control Systems utilizing the 
CARE III technique and compare these results with published assessment 
results for these systems. 

3.4 User-Oriented General Reliability Estimation Computer Program (CAR 
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interconnected by seven different bus networks (address bus, 
data bus, control bus, power bus, timing bus, interrupt bus, 
status bus) . Each of these elements and buses is provided 
with redundant spares, in various configurations depending 
upon its complexity. (One element, the memory module, is 
itself internally redundant as well.) 

The current FTSC reliability model is a simplified, one 
mode, sixteen-stage version of CARE II. In some cases, non- 
unity dormancy factors were used to account for the lower 
failure rate of inactive and unpowered modules. 
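
To illustrate the effect of a non-unity dormancy factor, the sketch below computes the reliability of one active module backed by one spare. It is a hedged example under assumptions that are not taken from the FTSC model: the dormant failure rate is taken to be the active rate divided by the dormancy factor, switchover is perfect and instantaneous, and the function and parameter names are hypothetical.

```python
from math import exp, expm1


def standby_pair_reliability(active_rate, hours, dormancy_factor):
    """Reliability of one active module plus one spare, perfect switching.

    Assumption for this sketch: the spare fails at
    active_rate / dormancy_factor while dormant and at active_rate once
    it has been switched in.
    """
    spare_rate = active_rate / dormancy_factor
    # Probability that neither module has failed by the mission time.
    both_good = exp(-(active_rate + spare_rate) * hours)
    if spare_rate == 0.0:
        # Cold spare: the single surviving module is always the old spare.
        one_good = active_rate * hours * exp(-active_rate * hours)
    else:
        # One failure (active or dormant) has occurred, survivor still good.
        one_good = (
            (active_rate + spare_rate) / spare_rate
            * exp(-active_rate * hours)
            * -expm1(-spare_rate * hours)
        )
    return both_good + one_good


if __name__ == "__main__":
    for factor in (1.0, 10.0, float("inf")):
        print(factor, standby_pair_reliability(0.05, 10.0, factor))
```

A dormancy factor of 1 reproduces two modules in active parallel redundancy, and an infinite factor gives the cold-spare case, so the two limits bracket any intermediate value.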


2.2 CARE III REQUIREMENTS 

The emphasis in the previous section was on the techniques 
used to estimate the reliabilities of the systems in question. 
At a minimum, CARE III must provide a unified model for all 
four of those systems and hence reproduce, under the appropriate 
set of conditions, the results obtained using each of these 
models. This, of course, is a necessary but not a sufficient 
condition to place on CARE III. To be most useful, it must be 
flexible enough to overcome any limitations imposed by the 
above models (e.g., restrictive coverage models, limited fault 
models, etc.) and at the same time sufficiently general to 
allow other, as yet unspecified, fault-tolerant systems to be 
modeled without introducing artificial restrictions. The 
following paragraphs outline the requirements imposed on 
CARE III and explain the rationale for each of these requirements 
in terms of the above objectives. 

1. Capability of modeling up to at least 40 stages. 

Rationale: This is specified in the CARE III Statement of 
Work. Although none of the systems considered in paragraph 2.1 
requires as many as 40 stages, it is not difficult to conceive 
of systems that do. This requirement will be satisfied in 
CARE III by providing a means for concatenating independent 
runs. If the coupling between stages is limited, 
it will in fact be possible to model an arbitrarily large 
number of stages by making repeated runs. 
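
One way the results of repeated runs might be combined is sketched below. It assumes that each run covers a group of stages that is statistically independent of the groups in the other runs, and that the system needs every group to survive; the function name and the sample values are hypothetical and are not taken from CARE III.

```python
from math import expm1, log1p


def combined_unreliability(run_unreliabilities):
    """Combine failure probabilities of independent groups of stages.

    Assumes the system fails if any group fails. The product of the
    survival probabilities is formed in log space so that very small
    unreliabilities are not lost to rounding.
    """
    log_survival = sum(log1p(-q) for q in run_unreliabilities)
    return -expm1(log_survival)


if __name__ == "__main__":
    # Hypothetical per-run results for three independent groups of stages.
    print(combined_unreliability([1.0e-9, 2.0e-9, 5.0e-10]))
```

For unreliabilities this small the result is very close to the simple sum of the per-run values, which is why limited coupling between stages makes the repeated-run approach practical.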


2. Multiple operating modes for each set of coupled 
stages. 

Rationale: The operating mode of a system or subsystem 
is, so far as its reliability model is concerned, a function 
of its structure (number of units of various types that have 
to be operational for the system to function as specified) and 
its coverage parameters. If the system's structure or coverage 
coefficients change stochastically during its operating lifetime 
(e.g., if they depend upon the number of faults already 
incurred), such changes must be reflected in its reliability 
model. If a mode change in one stage precipitates a mode change 
in some other stage, the two stages are said to be coupled. 
(Deterministic structural or coverage parameter changes must, 
of course, also be reflected in the reliability model. Such 
changes are relatively easily accommodated, however, by 
introducing time-dependent coverage parameters and by concatenating 
reliability models representing the disjoint time intervals 
during which the system structure is invariant. Thus, such 
mode changes impose no new constraints provided only that the 
coverage parameters are allowed to be time-dependent.) 

CARE II allowed only one mode change (two operating modes); 
the exhaustion of the spares available at any one stage could 
cause the system to change from, say, a dual-redundant to a 
single-string configuration, thereby changing both the system 
structure and the coverage coefficients associated with each 
stage. Two of the systems discussed in paragraph 2.1, however, 
(SIFT and ARCS) exhibited mode changes after each new fault. 
Thus, the two-mode limitation of CARE II is not acceptable for 
CARE III. 

3. Separate coverage model similar to that in CARE II 
but capable of handling latent and intermittent faults as well 
as permanent faults. 

Rationale: The major advantage in keeping the reliability 
and coverage models distinct (as they were in CARE II) is 
that it allows the user to concentrate on each of these two 
areas relatively independently and hence simplifies the task 
of defining the system model. In addition, there are some 
significant practical advantages (cf. Section 4) in separating 
the reliability model, driven by infrequently occurring 
failures, from the coverage model reflecting the much more 
rapid detection, isolation and recovery events. 

The need to handle both intermittent and latent faults in 
the coverage model is evident from the discussion in paragraph 
2.1. 

4. Multiple success criteria 

Rationale: As ARCS clearly demonstrates, some redundant 
systems may be considered operational under any one of a number 
of possible conditions. It is therefore necessary for the user 
to be able to define each of those conditions and for CARE III 
to calculate the probability that at least one of them occurs. 
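
The calculation asked for here, the probability that at least one success criterion is met, can be illustrated by brute-force enumeration for a small system. The sketch assumes independent units with known reliabilities and expresses each criterion as a function of which units are up; the unit names, values, and criteria are hypothetical, and enumeration is only practical for a handful of units.

```python
from itertools import product


def prob_any_criterion(unit_reliabilities, criteria):
    """Probability that at least one success criterion is satisfied.

    unit_reliabilities maps a unit name to its probability of being up;
    each criterion takes a dict of unit name -> bool and returns a bool.
    Units are assumed to fail independently.
    """
    names = list(unit_reliabilities)
    total = 0.0
    for states in product((True, False), repeat=len(names)):
        up = dict(zip(names, states))
        if any(criterion(up) for criterion in criteria):
            weight = 1.0
            for name in names:
                r = unit_reliabilities[name]
                weight *= r if up[name] else 1.0 - r
            total += weight
    return total


if __name__ == "__main__":
    units = {"a": 0.9, "b": 0.9, "c": 0.9}
    criteria = [
        lambda up: up["a"] and up["b"],  # primary pair both working
        lambda up: up["c"],              # or the backup unit working
    ]
    print(prob_any_criterion(units, criteria))
```

Enumerating the joint states, rather than adding the probabilities of the individual criteria, avoids double counting the states in which more than one criterion holds.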


5. n-point failure mechanisms ("category 3" faults) 

Rationale: Most fault-tolerant systems exhibit "n-point-failure" 
mechanisms; i.e., sets of n failures (n > 1) that can 
disable the system even though spare hardware is available. 
If two BGUs fail in the enable mode in the FTMP, for example, 
the system is potentially inoperative even though spare operational 
modules are available. CARE II modeled such failure 
mechanisms only for n = 1. Although the probability of such 
failures is generally a rapidly decreasing function of n, it 
cannot a priori be considered negligible for all n > 1. The 
concept of a single-point failure must therefore be generalized 
to take this into account. 

6. Time-dependent hazard rates 

Rationale: All of the reliability models considered in 
paragraph 2.1 assumed constant hazard rates. There are at 
least two reasons why it would be desirable to relax this 
restriction: (1) Recent data indicate that at least in some 
environments (space) the hazard rates are far from constant. 
(2) The hazard rates associated with modules having internal 
redundancy are not constant even if the individual component 
hazard rates are. 
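
The second reason can be checked with a small worked example. Assume, purely for illustration, a module made of two identical components in parallel, each with constant hazard rate λ, where the module works while at least one component works. Its reliability is R(t) = 2exp(-λt) - exp(-2λt), and its hazard rate h(t) = -R'(t)/R(t) simplifies to 2λ(1 - exp(-λt)) / (2 - exp(-λt)). The Python sketch below evaluates this expression; the function name is hypothetical.

```python
from math import exp


def parallel_pair_hazard(component_rate, hours):
    """Hazard rate of a module built from two identical parallel components.

    Assumes each component has a constant hazard rate and the module
    works while at least one component works.
    """
    e = exp(-component_rate * hours)
    return 2.0 * component_rate * (1.0 - e) / (2.0 - e)


if __name__ == "__main__":
    for hours in (0.0, 10.0, 100.0, 1000.0):
        print(hours, parallel_pair_hazard(0.01, hours))
```

The module hazard rate starts at zero and rises toward the component rate λ as time increases, so it is not constant even though the component rates are.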

7. Transient faults 

Rationale: Most faults are modeled either as permanent 
or intermittent, the latter actually being permanent faults 
that manifest themselves intermittently. Some faults may 
well be transient in nature, however; e.g., faults due to 
noise or those due to improperly validated software. In such 
cases, no hardware damage has occurred and, as soon as the 
cause of the fault disappears, the system can, in principle, 
function as before. 

8. Non-unity dormancy factors 

Rationale: Of the four models discussed in paragraph 
2.1, only the FTSC model allowed non-unity dormancy factors. 
In some cases, it is reasonable to assume that dormant (e.g., 
unpowered or inactive) modules may have lower hazard rates 
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Computationally fast tools (less than 24 cpu hours for very large models) 


CARE III EVOLUTION 
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AGENDA 

CARE III Users’ Workshop, October 6-7, 1987 


Co-Chairmen: Salvatore J. Bavuso, NASA Langley Research Center 
Anna L. Martensen, PRC Kentron, Inc. 


Tuesday 

Welcome: Sal Bavuso, Workshop Co-Chairman Division Representative; 

Chuck Meissner, Branch Head SVMB 
Introduction to the CARE III Workshop: Sal Bavuso 
Introduction to the CARE III Mathematical Model: Roberto E. Altschul 
The CARE III Implementation and Code: Anna L. Martensen 
RTI’s Use of CARE III: Charlotte Scheper 
Use of CARE III for Flight Controls Development 
at Northrop: Jack Flynn 


Wednesday 

Questionnaire Review and Discussion: Sal Bavuso 
CARE III Model User’s Overview: John Sight 
Examples of Nonconservative Reliability Estimates 
Given by CARE III: Kelly J. Dotson 
Comparison of Tools (CARE III, SURE, HARP): Anna L. Martensen 
Demonstrations in AIRLAB: 

Overview: Chuck Meissner 

The Semi-Markov Unreliability Range Evaluator (SURE): Ricky Butler 
Fault Injection: George Finelli 
Software Reliability: Jon Sjogren 
HARP: Sal Bavuso 

CARE III and HARP Hands-On Demonstrations and Tutorials 
Weibull References 



CARE III USERS’ WORKSHOP ATTENDEES 
AND THEIR ADDRESSES 

October 6-7, 1987 


Name (Phone) 
Job Title 
Address 

Charlotte Scheper, (919) 541-7116 
Comp. Scientist 
Research Triangle Institute, PO Box 12194, Research Triangle Park, NC 27709 

Jack Flynn, (213) 940-5076 
Sr. Tech. Spec. 
Northrop Corporation, E294/6A, 8900 E. Washington Blvd., Pico Rivera, CA 90660 

Don Lee, (213) 366-4366 
MTS 
The Aerospace Corp., M/S M1/166, 2350 El Segundo Blvd., El Segundo, CA 90245 

Jocelyn Frosch, (817) 763-3278 
Engineer 
General Dynamics, Ft. Worth Division, PO Box 748, MZ 2660, Ft. Worth, TX 76101 

Fred Swern, (201) 420-5582 
Professor 
Stevens Institute of Technology, Dept. of Mechanical Engineering, Hoboken, NJ 07030 

Lori Bechtold, (206) 773-8613 
Engineer 
Boeing Aerospace Co., M/S 82-15, PO Box 3999, Seattle, WA 98124-2499 

Ha Vuong 
Engineer 
Boeing Aerospace Co., M/S 82-15, PO Box 3999, Seattle, WA 98124-2499 

David DeLorm, (617) 276-2517 
Rel. Eng. 
ITEK Optical Systems, 10 Maguire Rd, Lexington, MA 02173 

Robert P. Landstrom, (617) 440-2019 
Sr. Eng. 
Raytheon Co., Equipment Div., 528 Boston Post Rd., Sudbury, MA 01776 

Don Livaccari, (516) 346-2270 
Sr. Eng. 
Grumman Space Systems, M/S A02-105, Bethpage, NY 11714 

Wah Ng, (516) 346-2887 
Rel. Eng. 
Grumman ASD, M/S K03-14, Bethpage, NY 11714 

Peter Yip, (516) 346-2888 
Rel. Eng. 
Grumman ASD, M/S K03-16, Bethpage, NY 11714 

Jacob Shuker 
Sen. Eng. 
Grumman Space Systems Div., M/S A02-105, Bethpage, NY 11714 

Johnny Sight, (213) 332-0367 
Engineer II 
Northrop Corporation Aircraft Div., M/S 1834/90, One Northrop Ave., Hawthorne, CA 90250 

James F. Eck, (213) 594-3218 
MTS 6 
Rockwell International, 2600 Westminster Blvd., Mail Code SK54, Seal Beach, CA 90740-7644 

Robert Villet, (213) 594-3218 
MTS 6 
Rockwell International, Box 3644, Mail Code SK54, Seal Beach, CA 90740-7644 

Kurt A. Liebel, (602) 869-2837 
Sen. Proj. Eng. 
Honeywell, Inc., Sperry Comm. Flt. Sys. Group, PO Box 2111, MS 020C4, Phoenix, AZ 85036 

Roberto E. Altschul, (206) 865-3031 
Res. Eng. 
Boeing Electronics Co., PO Box 24969, MS 7J-27, Seattle, WA 98124-6269 

Jerry Bilyk 
Unit Chief 
McDonnell-Douglas Corp., Box 516, Bld. 065, L4W, 403, St. Louis, MO 63166 
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